Experimenting with Automatic Text Summarisation for Arabic

نویسندگان

  • Mahmoud El-Haj
  • Udo Kruschwitz
  • Chris Fox
چکیده

The volume of information available on the Web is increasing rapidly. The need for systems that can automatically summarize documents is becoming ever more desirable. For this reason, text summarization has quickly grown into a major research area as illustrated by the DUC and TAC conference series. Summarization systems for Arabic are however still not as sophisticated and as reliable as those developed for languages like English. In this paper we discuss two summarization systems for Arabic and report on a large user study performed on these systems. The first system, the Arabic Query-Based Text Summarization System (AQBTSS), uses standard retrieval methods to map a query against a document collection and to create a summary. The second system, the Arabic Concept-Based Text Summarization System (ACBTSS), creates a query-independent document summary. Five groups of users from different ages and educational levels participated in evaluating our systems. Each group had 300 individuals. We also performed a comparative evaluation with a commercial Arabic summarization system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people interact to express their thoughts or feelings about different subjects. However, some of users may write offensive twits to other via social media which known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...

متن کامل

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

متن کامل

A ranking approach to summarising Twitter home timelines

The rise of social media services has changed the ways in which users can communicate and consume content online. Whilst online social networks allow for fast and convenient delivery of knowledge, users are prone to information overload when too much information is presented for them to read and process. Automatic text summarisation is a tool to help mitigate information overload. In automatic ...

متن کامل

University of Essex at the TAC 2011 MultiLingual Summarisation Pilot

We present the results of our Arabic and English runs at the TAC 2011 Multilingual summarisation (MultiLing) task. We participated with centroid-based clustering for multidocument summarisation. The automatically generated Arabic and English summaries were evaluated by human participants and by two automatic evaluation metrics, ROUGE and AutoSummENG. The results are compared with the other syst...

متن کامل

Using a Keyness Metric for Single and Multi Document Summarisation

In this paper we show the results of our participation in the MultiLing 2013 summarisation tasks. We participated with single-document and multi-document corpus-based summarisers for both Arabic and English languages. The summarisers used word frequency lists and log likelihood calculations to generate single and multi document summaries. The single and multi summaries generated by our systems ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009